Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
English Standard Arabic
Availability:
Freely Available
License:
GNU
Size:
120000 tokens
Production Status:
Existing-used
Use:
Annotation and Educational
- Paper title: LAMP: A Multimodal Web Platform for Collaborative Linguistic Analysis
- Paper track: Multimodality
- Paper status: Accept Poster+Demo
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Kais Dukes | University of Leeds | None |
| Author 2 | Eric Atwell | University of Leeds | None |
| Main Contact | Kais Dukes | University of Leeds | GB |
Documentation:
http://corpus.quran.com/documentation

Language Type:
Multilingual
Languages:
American English German Korean Mandarin Chinese Standard Arabic
Availability:
TBD
License:
TBD
Size:
9000 sentences
Production Status:
Newly created-in progress
Use:
Natural Language Generation
- Paper title: A Database for Measuring Linguistic Information Content
- Paper track: Infrastructural Issues/Large Projects
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Richard Sproat | | US |
| Author 2 | Bruno Cartoni | | US |
| Author 3 | HyunJeong Choe | | KR |
| Author 4 | David Huynh | | US |
| Author 5 | Linne Ha | | US |
| Author 6 | Ravindran Rajakumar | | US |
| Author 7 | Evelyn Wenzel-Grondie | | US |
| Main Contact | Richard Sproat | None | None |
Documentation:
TBD

Language Type:
Multilingual
Languages:
English Iranian Persian Standard Arabic Urdu
Availability:
<Not Specified>
License:
<Not Specified>
Size:
Very big (unit: Other)
Production Status:
Collected for one year so far; collection will continue until the start of the LREC conference
Use:
Corpus Creation/Annotation
- Paper title: Creation of comparable corpora for English-{Urdu, Arabic, Persian}
- Paper track: Written
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Murad Abouammoh | King Saud University | SA |
| Author 2 | Kashif Shah | University of Sheffield | GB |
| Author 3 | Ahmet Aker | University of Sheffield | GB |
| Main Contact | Ahmet Aker | University of Sheffield | None |
Documentation:
<Not Specified>

Written
Corpus,
Language Type:
Multilingual
Languages:
Ancient Greek Basque Croatian Standard Arabic
Availability:
Freely Available
License:
CC BY 4.0, CC BY-SA 3.0, CC BY-SA 4.0, CC BY-NC-SA 2.5, CC BY-NC-SA 3.0 US, CC BY-NC-SA 3.0, CC BY-NC-SA 4.0, GNU GPL 2.0, GNU GPL 3.0
Size:
7529911 words
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing
- Paper track: Written
- Paper status: Accept Poster+Demo
| Author Number | Name | Affiliation | Country | Affiliation 2 | Country 2 |
|---|---|---|---|---|---|
| Author 1 | Milan Straka | Charles University in Prague | None | Charles University | None |
| Author 2 | Jan Hajic | Charles University in Prague | CZ | | |
| Author 3 | Jana Straková | Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics | CZ | | |
| Main Contact | Milan Straka | Charles University in Prague | None | Charles University | None |
Documentation:
http://universaldependencies.org/

Written
Treebank,
Language Type:
Multilingual
Languages:
Basque Bulgarian Croatian Standard Arabic
Availability:
Freely Available
License:
Various licenses, mostly Creative Commons
Size:
250 000 sentences
Production Status:
Newly created-in progress
Use:
Parsing and Tagging
- Paper title: Universal Dependencies v1: A Multilingual Treebank Collection
- Paper track: Written
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Joakim Nivre | Uppsala University | SE |
| Author 2 | Marie-Catherine de Marneffe | The Ohio State University | US |
| Author 3 | Filip Ginter | University of Turku | FI |
| Author 4 | Yoav Goldberg | Bar Ilan University | IL |
| Author 5 | Jan Hajic | Charles University in Prague | CZ |
| Author 6 | Christopher D. Manning | Stanford University | US |
| Author 7 | Ryan McDonald | | US |
| Author 8 | Slav Petrov | | US |
| Author 9 | Sampo Pyysalo | University of Cambridge | GB |
| Author 10 | Natalia Silveira | Stanford University | US |
| Author 11 | Reut Tsarfaty | Open University of Israel | IL |
| Author 12 | Daniel Zeman | Charles University in Prague, Faculty of Mathematics and Physics | CZ |
| Main Contact | Joakim Nivre | Uppsala University | None |
Documentation:
Documentation is available in English on the website

Written
Corpus,
Language Type:
Multilingual
Languages:
Danish Dutch Finnish Mandarin Chinese Standard Arabic
Availability:
Freely Available
License:
<Not Specified>
Size:
4.2 MByte
Production Status:
Newly created-finished
Use:
Person Identification
- Paper title: Creating and Curating a Cross-Language Person-Entity Linking Collection
- Paper track: Evaluation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Dawn Lawrie | Loyola College in Maryland | None |
| Author 2 | James Mayfield | Johns Hopkins University | None |
| Author 3 | Paul McNamee | Johns Hopkins University | None |
| Author 4 | Douglas Oard | University of Maryland | None |
| Main Contact | Dawn Lawrie | Loyola University Maryland | US |
Documentation:
Documentation in English, included with the download

Written
Corpus,
Language Type:
Multilingual
Languages:
Bengali Bulgarian Catalan Czech Standard Arabic
Availability:
Freely Available
License:
Creative Commons, GNU
Size:
1 GByte
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
- Paper track: Written
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country | Affiliation 2 | Country 2 |
|---|---|---|---|---|---|
| Author 1 | Zhiwei Yu | Shanghai Jiaotong University | CN | | |
| Author 2 | David Mareček | Charles University in Prague | CZ | | |
| Author 3 | Zdeněk Žabokrtský | Charles University in Prague, Faculty of Mathematics and Physics | CZ | Charles University in Prague, Institute of Formal and Applied Linguistics | None |
| Author 4 | Daniel Zeman | Charles University in Prague, Faculty of Mathematics and Physics | CZ | | |
| Main Contact | Daniel Zeman | Charles University in Prague, Faculty of Mathematics and Physics | None | Charles University, Faculty of Mathematics and Physics | None |
Documentation:
http://ufal.mff.cuni.cz/hamledt

Language Type:
Multilingual
Languages:
English Farsi Spanish Standard Arabic Tamazight, Central Atlas
Availability:
Freely Available
License:
<Not Specified>
Size:
76 MByte
Production Status:
Newly created-finished
Use:
Acquisition
- Paper title: TLAXCALA: a multilingual corpus of independent news
- Paper track: Written
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Antonio Toral | Dublin City University | NL |
| Main Contact | Antonio Toral | University of Groningen | None |
Documentation:
<Not Specified>

Language Type:
Multilingual
Languages:
Bulgarian German Greek Standard Arabic
Availability:
Freely Available
License:
Creative Commons
Size:
40 GByte
Production Status:
Newly created-finished
Use:
Paraphrasing, coverage expansion
- Paper title: The Multilingual Paraphrase Database
- Paper track: Written
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Juri Ganitkevitch | Johns Hopkins University | US |
| Author 2 | Chris Callison-Burch | University of Pennsylvania | US |
| Main Contact | Juri Ganitkevitch | Johns Hopkins University | None |
Documentation:
<Not Specified>

Language Type:
Multilingual
Languages:
Egyptian Arabic English North Levantine Arabic Standard Arabic
Availability:
From Owner
License:
<Not Specified>
Size:
236913 words
Production Status:
Newly created-in progress
Use:
Emotion Recognition/Generation
- Paper title: SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis
- Paper track: Evaluation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Muhammad Abdul-Mageed | Indiana University | CA |
| Author 2 | Mona Diab | George Washington University | US |
| Main Contact | Muhammad Abdul-Mageed | The University of British Columbia | None |
Documentation:
<Not Specified>